An improved speaker diarization system

Authors

  • Rong Fu
  • Ian D. Benest
Abstract

This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations recorded with a single central microphone. The new system is robust to different acoustic environments: it requires neither pre-trained models nor development sets to initialize its parameters, and it determines the model complexity automatically. It adapts each segment model from a universal background model, and uses the cross-likelihood ratio instead of the Bayesian Information Criterion (BIC) for merging. Finally, it uses an intra-cluster/inter-cluster ratio as the stopping criterion. Together, these changes reduce the speaker diarization error rate from 21.76% to 17.21% compared with the baseline system [1].
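
The abstract does not give the formula used for the cross-likelihood ratio (CLR), so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes that each cluster model is a single diagonal Gaussian whose mean is MAP-adapted from the universal background model (a real system would typically use Gaussian mixtures), and that two clusters are candidates for merging when their CLR is high. The function names, the `relevance` factor and the synthetic data are all hypothetical.

```python
import numpy as np


def diag_gauss_loglik(x, mean, var):
    """Per-frame log-likelihood under a diagonal-covariance Gaussian."""
    x = np.atleast_2d(x)
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=1)


def map_adapt_mean(x, ubm_mean, relevance=16.0):
    """MAP-adapt the background mean towards the cluster data (assumed scheme)."""
    n = x.shape[0]
    alpha = n / (n + relevance)  # more frames -> trust the data more
    return alpha * x.mean(axis=0) + (1.0 - alpha) * ubm_mean


def cross_likelihood_ratio(x_i, x_j, ubm_mean, ubm_var, relevance=16.0):
    """Average log-likelihood gain of each cluster under the other's model,
    measured relative to the background model. Higher means more similar."""
    mean_i = map_adapt_mean(x_i, ubm_mean, relevance)
    mean_j = map_adapt_mean(x_j, ubm_mean, relevance)
    gain_i = np.mean(diag_gauss_loglik(x_i, mean_j, ubm_var)
                     - diag_gauss_loglik(x_i, ubm_mean, ubm_var))
    gain_j = np.mean(diag_gauss_loglik(x_j, mean_i, ubm_var)
                     - diag_gauss_loglik(x_j, ubm_mean, ubm_var))
    return gain_i + gain_j


# Synthetic 4-dimensional "features": clusters a and b share a speaker, c does not.
rng = np.random.default_rng(0)
ubm_mean, ubm_var = np.zeros(4), np.ones(4)
cluster_a = rng.normal(2.0, 1.0, size=(200, 4))
cluster_b = rng.normal(2.0, 1.0, size=(200, 4))
cluster_c = rng.normal(-2.0, 1.0, size=(200, 4))

clr_same = cross_likelihood_ratio(cluster_a, cluster_b, ubm_mean, ubm_var)
clr_diff = cross_likelihood_ratio(cluster_a, cluster_c, ubm_mean, ubm_var)
print(clr_same, clr_diff)
```

In this sketch the same-speaker pair scores higher than the different-speaker pair, so an agglomerative clustering loop would merge the highest-scoring pair at each step. The stopping rule is a separate decision; the abstract names an intra-cluster/inter-cluster ratio for it but does not define it, so it is not sketched here.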

Similar resources

Modulation spectrogram features for improved speaker diarization

We propose the use of modulation spectrogram features in speaker diarization. These features carry longer term characteristics of the acoustic signals than the widely used MFCCs, thus providing potential improvement by using both features in combination. Using the state-of-the-art ICSI speaker diarization system, an improvement of 20.77% relative DER is obtained on the NIST Rich Transcription 2...

Bayes factor based speaker segmentation for speaker diarization

This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant sized, sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show an improved segmentation perform...

A comparison of neural network feature transforms for speaker diarization

Speaker diarization finds contiguous speaker segments in an audio stream and clusters them by speaker identity, without using a-priori knowledge about the number of speakers or enrollment data. Diarization typically clusters speech segments based on short-term spectral features. In prior work, we showed that neural networks can serve as discriminative feature transformers for diarization by tra...

Improving Speaker Diarization

This paper describes the LIMSI speaker diarization system used in the RT-04F evaluation. The RT-04F system builds upon the LIMSI baseline data partitioner, which is used in the broadcast news transcription system. This partitioner provides a high cluster purity but has a tendency to split the data from a speaker into several clusters when there is a large quantity of data for the speaker. In th...

Speaker diarization of spontaneous meeting room conversations

Speaker diarization is the task of identifying “who spoke when” in an audio stream containing multiple speakers. This is an unsupervised task as there is no a priori information about the speakers. Diagnostic studies on state-of-the-art diarization systems have isolated three main issues with the systems: overlapping speech, effects of background noise and speech/nonspeech detection errors on...

Publication date: 2007